Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
Bibletts是一种在撒哈拉以南非洲使用的十种语言的大型,高质量的开放语音数据集。该语料库包含每语言最多86个小时的对齐,工作室质量的48kHz单扬声器唱片,从而能够开发高质量的文本到语音模型。代表的十种语言是:Akuapem Twi,Asante Twi,Chichewa,Ewe,Hausa,Kikuyu,Lingala,Luganda,Luganda,Luo和Yoruba。该语料库是由Biblica的Open.Bible Project制作和发行的圣经录音的衍生作品。我们已经对齐,清洁和过滤了原始录音,并还对每种语言的对齐子进行了手工检查。我们为具有Coqui TTS的文本到语音模型提供了结果。数据是根据商业友好的CC-SA许可发布的。
translated by 谷歌翻译
语言模型预训练的最新进展利用大规模数据集创建多语言模型。但是,这些数据集中大多遗漏了低资源语言。这主要是因为网络上没有很好地表示口语,因此被排除在用于创建数据集的大规模爬网中。此外,这些模型的下游用户仅限于最初选择用于预训练的语言的选择。这项工作调查了如何最佳利用现有的预培训模型来为16种非洲语言创建低资源翻译系统。我们关注两个问题:1)如何将预训练的模型用于初始预培训中未包含的语言? 2)生成的翻译模型如何有效地转移到新域?为了回答这些问题,我们创建了一个新的非洲新闻语料库,涵盖16种语言,其中8种语言不属于任何现有评估数据集的一部分。我们证明,将两种语言转移到其他语言和其他领域的最有效策略是,以少量的高质量翻译数据微调大型预训练模型。
translated by 谷歌翻译
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients, and hence requiring no prior information about the similarity measures. The heterogeneous causal effects can be estimated with no sharing of the raw training data among the sources, thus minimizing the risk of privacy leak. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
translated by 谷歌翻译
In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment the objective with a regularizer to address challenges associated with ill-posedness. The choice of a suitable regularizer is typically driven by prior domain information and computational considerations. Convex regularizers are attractive as they are endowed with certificates of optimality as well as the toolkit of convex analysis, but exhibit a computational scaling that makes them ill-suited beyond moderate-sized problem instances. On the other hand, nonconvex regularizers can often be deployed at scale, but do not enjoy the certification properties associated with convex regularizers. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what are the optimal regularizers, both convex and nonconvex, for data drawn from the distribution? What properties of a data source govern whether it is amenable to convex regularization? We address these questions for the class of continuous and positively homogenous regularizers for which convex and nonconvex regularizers correspond, respectively, to convex bodies and star bodies. By leveraging dual Brunn-Minkowski theory, we show that a radial function derived from a data distribution is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization. Using tools such as $\Gamma$-convergence, we show that our results are robust in the sense that the optimal regularizers for a sample drawn from a distribution converge to their population counterparts as the sample size grows large. Finally, we give generalization guarantees that recover previous results for polyhedral regularizers (i.e., dictionary learning) and lead to new ones for semidefinite regularizers.
translated by 谷歌翻译
Automatic segmentation of kidney and kidney tumour in Computed Tomography (CT) images is essential, as it uses less time as compared to the current gold standard of manual segmentation. However, many hospitals are still reliant on manual study and segmentation of CT images by medical practitioners because of its higher accuracy. Thus, this study focuses on the development of an approach for automatic kidney and kidney tumour segmentation in contrast-enhanced CT images. A method based on Convolutional Neural Network (CNN) was proposed, where a 3D U-Net segmentation model was developed and trained to delineate the kidney and kidney tumour from CT scans. Each CT image was pre-processed before inputting to the CNN, and the effect of down-sampled and patch-wise input images on the model performance was analysed. The proposed method was evaluated on the publicly available 2021 Kidney and Kidney Tumour Segmentation Challenge (KiTS21) dataset. The method with the best performing model recorded an average training Dice score of 0.6129, with the kidney and kidney tumour Dice scores of 0.7923 and 0.4344, respectively. For testing, the model obtained a kidney Dice score of 0.8034, and a kidney tumour Dice score of 0.4713, with an average Dice score of 0.6374.
translated by 谷歌翻译
The Coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and now becoming a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the present of radiographic abnormalities from their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model was proposed to aid radiologists in diagnosing COVID-19 patients. First, this work conducted a comparative study on the performance of modified VGG-16, ResNet-50 and DenseNet-121 to classify CXR images into normal, COVID-19 and viral pneumonia. Then, the impact of image augmentation on the classification results was evaluated. The publicly available COVID-19 Radiography Database was used throughout this study. After comparison, ResNet-50 achieved the highest accuracy with 95.88%. Next, after training ResNet-50 with rotation, translation, horizontal flip, intensity shift and zoom augmented dataset, the accuracy dropped to 80.95%. Furthermore, an ablation study on the effect of image augmentation on the classification results found that the combinations of rotation and intensity shift augmentation methods obtained an accuracy higher than baseline, which is 96.14%. Finally, ResNet-50 with rotation and intensity shift augmentations performed the best and was proposed as the final classification model in this work. These findings demonstrated that the proposed classification model can provide a promising result for COVID-19 diagnosis.
translated by 谷歌翻译
With more and more data being collected, data-driven modeling methods have been gaining in popularity in recent years. While physically sound, classical gray-box models are often cumbersome to identify and scale, and their accuracy might be hindered by their limited expressiveness. On the other hand, classical black-box methods, typically relying on Neural Networks (NNs) nowadays, often achieve impressive performance, even at scale, by deriving statistical patterns from data. However, they remain completely oblivious to the underlying physical laws, which may lead to potentially catastrophic failures if decisions for real-world physical systems are based on them. Physically Consistent Neural Networks (PCNNs) were recently developed to address these aforementioned issues, ensuring physical consistency while still leveraging NNs to attain state-of-the-art accuracy. In this work, we scale PCNNs to model building temperature dynamics and propose a thorough comparison with classical gray-box and black-box methods. More precisely, we design three distinct PCNN extensions, thereby exemplifying the modularity and flexibility of the architecture, and formally prove their physical consistency. In the presented case study, PCNNs are shown to achieve state-of-the-art accuracy, even outperforming classical NN-based models despite their constrained structure. Our investigations furthermore provide a clear illustration of NNs achieving seemingly good performance while remaining completely physics-agnostic, which can be misleading in practice. While this performance comes at the cost of computational complexity, PCNNs on the other hand show accuracy improvements of 17-35% compared to all other physically consistent methods, paving the way for scalable physically consistent models with state-of-the-art performance.
translated by 谷歌翻译
Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting the time information (e.g., multiple frames) has been shown to help to capture better the dynamics of objects and, therefore, the variation in the shape of objects. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show such a method's relevance for detecting objects in different radar representations (range-Doppler, range-angle) and outperform state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive. The code will be available soon.
translated by 谷歌翻译
Deep learning models are being increasingly applied to imbalanced data in high stakes fields such as medicine, autonomous driving, and intelligence analysis. Imbalanced data compounds the black-box nature of deep networks because the relationships between classes may be highly skewed and unclear. This can reduce trust by model users and hamper the progress of developers of imbalanced learning algorithms. Existing methods that investigate imbalanced data complexity are geared toward binary classification, shallow learning models and low dimensional data. In addition, current eXplainable Artificial Intelligence (XAI) techniques mainly focus on converting opaque deep learning models into simpler models (e.g., decision trees) or mapping predictions for specific instances to inputs, instead of examining global data properties and complexities. Therefore, there is a need for a framework that is tailored to modern deep networks, that incorporates large, high dimensional, multi-class datasets, and uncovers data complexities commonly found in imbalanced data (e.g., class overlap, sub-concepts, and outlier instances). We propose a set of techniques that can be used by both deep learning model users to identify, visualize and understand class prototypes, sub-concepts and outlier instances; and by imbalanced learning algorithm developers to detect features and class exemplars that are key to model performance. Our framework also identifies instances that reside on the border of class decision boundaries, which can carry highly discriminative information. Unlike many existing XAI techniques which map model decisions to gray-scale pixel locations, we use saliency through back-propagation to identify and aggregate image color bands across entire classes. Our framework is publicly available at \url{https://github.com/dd1github/XAI_for_Imbalanced_Learning}
translated by 谷歌翻译